InternVL3-38B is a multimodal large language model (MLLM) with strong multimodal perception and reasoning. It improves significantly on previous models and extends its multimodal capabilities to areas such as tool use and GUI agents.